2 research outputs found

    DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies

    We introduce an information-theoretic quantity with properties similar to mutual information that can be estimated from data without explicit assumptions on the underlying distribution. This quantity is based on a recently proposed matrix-based entropy that uses the eigenvalues of a normalized Gram matrix to estimate the eigenvalues of an uncentered covariance operator in a reproducing kernel Hilbert space. We show that a difference of matrix-based entropies (DiME) is well suited to problems involving the maximization of mutual information between random variables. While many methods for such tasks can converge to trivial solutions, DiME naturally penalizes such outcomes. We compare DiME to several baseline estimators of mutual information on a toy Gaussian dataset. We provide examples of use cases for DiME, such as latent factor disentanglement and a multiview representation learning problem, where DiME is used to learn a shared representation among views with high mutual information.
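
    The entropy described above can be sketched in a few lines: form a kernel Gram matrix, normalize it to unit trace, and evaluate a Rényi-type functional of its eigenvalues; a mutual-information-like quantity is then a difference of such entropies. The Gaussian kernel, bandwidth `sigma`, entropy order `alpha`, and the Hadamard-product joint matrix below are illustrative assumptions, not necessarily the paper's exact DiME objective.

    ```python
    import numpy as np

    def gram_matrix(X, sigma=1.0):
        """Gaussian-kernel Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

    def matrix_entropy(K, alpha=1.01):
        """Matrix-based Renyi entropy of order alpha, computed from the
        eigenvalues of the trace-normalized Gram matrix (an estimate of the
        spectrum of the uncentered covariance operator in the RKHS)."""
        A = K / np.trace(K)                 # normalize so eigenvalues sum to 1
        eigvals = np.linalg.eigvalsh(A)
        eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
        return np.log2(np.sum(eigvals**alpha)) / (1.0 - alpha)

    def matrix_mutual_information(X, Y, sigma=1.0, alpha=1.01):
        """A mutual-information-like quantity as a difference of matrix-based
        entropies: S(A) + S(B) - S(A o B), with o the Hadamard product."""
        A, B = gram_matrix(X, sigma), gram_matrix(Y, sigma)
        return (matrix_entropy(A, alpha) + matrix_entropy(B, alpha)
                - matrix_entropy(A * B, alpha))
    ```

    Because every step is differentiable in the inputs, an objective of this form can be maximized by gradient ascent on a learned representation, which is the use case the abstract describes.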

    The Representation Jensen-Rényi Divergence

    We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by kernels. The empirical estimator of the divergence is computed using the eigenvalues of positive definite Gram matrices obtained by evaluating the kernel over pairs of data points. The new measure has properties similar to the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results on the difference between the ordered spectrum of the Gram matrices and that of the integral operators associated with the population quantities. The proposed measure of divergence avoids estimating the probability distribution underlying the data. Numerical experiments on comparing distributions and on sampling unbalanced data for classification show that the proposed divergence can achieve state-of-the-art results.
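
    Following the Jensen-Shannon pattern the abstract alludes to, one plausible empirical estimator computes the matrix-based entropy of the pooled sample and subtracts the average entropy of the two individual samples. This is a hypothetical sketch reusing `gram_matrix` and `matrix_entropy` from the example above; the paper's exact estimator, weighting, and kernel choice may differ.

    ```python
    import numpy as np  # gram_matrix and matrix_entropy as defined in the sketch above

    def representation_jr_divergence(X, Y, sigma=1.0, alpha=1.01):
        """Hypothetical Jensen-Shannon-style divergence between two samples:
        entropy of the pooled ("mixture") sample minus the mean entropy of
        the individual samples. The value is small when X and Y are drawn
        from the same distribution."""
        Z = np.vstack([X, Y])  # pooled sample plays the role of the mixture
        s_mix = matrix_entropy(gram_matrix(Z, sigma), alpha)
        s_x = matrix_entropy(gram_matrix(X, sigma), alpha)
        s_y = matrix_entropy(gram_matrix(Y, sigma), alpha)
        return s_mix - 0.5 * (s_x + s_y)
    ```

    As with the DiME sketch, no density estimation is involved: everything is computed from kernel evaluations over pairs of data points, matching the abstract's claim that the divergence avoids estimating the underlying probability distribution.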